% Qlisp proposal First 18mo

\input cltex.fix[tex,clt]

\cltpage{\magstep1}

\headline={\hfil} % headline is normally blank
\footline={\hss\tenrm\folio\hss}
\plainpages
\overfullrule=0pt

\def\chapFont{\twelveit}
\def\sectFont{\tenbf}
\def\ssectFont{\tenbf}
\def\sssectFont{\tenbf}


\normalsize
\raggedbottom

%  \input cover.
\vsize=9.0 true in 
\voffset= -.25 true in
\begintitle
\vfill
\centerline{
\vbox{\ialign{\hfil # \hfil\cr 
Stanford University\cr
Department of Computer Science\cr
proposal to\cr
Defense Advanced Research Projects Agency\cr
on\cr
\bf Qlisp for Parallel Processors\cr
\noalign{\vskip 0.1 true in}
Prof. John McCarthy, Principal Investigator\cr
}}
}
\noindent
Summary:

Initiation of a research project in parallel processing is proposed
to demonstrate the feasibility of
an extension of Common Lisp called Qlisp, as described in
Gabriel \& McCarthy, 1984.
An initial eighteen-month task
is to be undertaken at a cost of \$2.3 million.
It is planned to use queue-based multi-processing with the following features.

\item {$\bullet$} Each processor can address the whole of memory, and a processor may
execute programs anywhere in memory on data located anywhere in memory.

\item {$\bullet$} The programmer indicates when parallelism is possible by using parallel
constructions in the source language, which is an extension of Lisp.

\item {$\bullet$} When a program comes to a statement allowing parallelism and decides
(according to the computed value of an allow-parallelism parameter
in the statement) that parallelism is to be invoked, it adds a
collection of tasks to a queue and starts on the first.

\item {$\bullet$} When a processor completes a task it goes to the queue for its next task.

Basing parallelism on run-time queues means that a program isn't written
or compiled for any specific number of processors.  The number available
can even change during the course of a computation.  Tasks need not be of
similar length, because a processor finishing a short task merely takes
another from the queue.

This project will involve a collaboration between Stanford University and
its proposed subcontractor, Lucid, Inc., together with the
University of California at Berkeley.
The latter organization will seek separate funding from DARPA for its
contribution to this project.
An initial task of 18 months' duration involving Stanford and Lucid,
described and budgeted in this proposal, will produce a functioning Qlisp
system, documentation and debugging tools.

This is proposed as a step toward the longer range goal of demonstrating
symbolic computation in a parallel computing system.  The follow-on task,
to be proposed for funding later, will apply, evaluate, and tune the Qlisp
system on demonstration tasks in symbolic computation.  For comprehensibility,
this proposal describes both tasks.

\endtitle
%  \input prop.
\section{Introduction}
This is a proposal for research in implementing and demonstrating
applications of an extension of Lisp for parallel processing called
Qlisp.\footnote {The scheme under discussion was called ``Qlambda'' in
Gabriel and McCarthy (1984).
An extension of that paper is included as Appendix B.
The name ``Qlambda'' was used instead of ``Qlisp'' because there had been an earlier
system called Qlisp developed by Earl Sacerdoti.
In a recent discussion, Dr. Sacerdoti indicated that the earlier Qlisp
has been out of use long enough that he thinks the likelihood
of confusion arising from reuse of the name is small.
Therefore we are adopting the preferred name of ``Qlisp.''}
Qlisp (Gabriel and McCarthy 1984) allows the programmer to
specify tasks that may be performed in parallel by creating queues of
tasks.  The project involves acquiring a parallel processor, developing
Qlisp for it, developing test applications for it (algorithms for algebraic
mathematics comparable to those of
Macsyma), and testing the performance of the applications on substantial
symbolic computation tasks.

The project will be carried out with both formal and informal
collaboration among several groups.  The principal investigator for the
overall project will be Professor John McCarthy.  The Qlisp
implementation will be done under subcontract to Lucid Inc.  The design
and implementation of Qlisp will be supervised by Richard P. Gabriel of
Lucid and Stanford.  The Macsyma application will be developed in
collaboration with Professor Richard Fateman of the University of
California at Berkeley.  Separate proposals for the Lucid subcontract and
Fateman's contribution to the Macsyma application are attached.
Development of parallel programming tools and programs for testing
applications will be supervised by Carolyn Talcott.  We expect to interact
substantially with other symbolic parallel processing projects at Stanford
and Berkeley.  This interaction is outlined below.  The parallel facility
and Qlisp will be available on the ARPAnet for others to try out.

It is generally agreed that the main hope for large increases in computer
speed, whether for numerical work or symbolic computation as in artificial
intelligence, lies in massive parallelism.  Projects are being undertaken
that will involve hundreds or even thousands of processors.  However,
no one has yet demonstrated the ability to make general purpose use of
even two processors on symbolic computation for a single problem.  Indeed
even numerical computation on parallel processors has been difficult, and
there are no general purpose computation systems that make good use of
parallel processors in spite of more than twenty years of work.
Therefore, it is important to explore a variety of approaches to getting
good performance from parallel processors on a variety of problems.

Our approach is queue-based multi-processing.  It has the following
features.

\textindent{1.} 
Each processor can address the whole of memory, and a processor
may execute programs anywhere in memory on data located anywhere in memory.
This creates difficulties for the hardware designer, but we are running scared
on programming, and we cannot necessarily accept the forms of parallelism
that hardware people find most convenient to implement.

\textindent{2.} 
The programmer indicates when parallelism is possible by using
parallel constructions in the source language, which is an extension of
Lisp.  See Gabriel and McCarthy 1984, an extension of which is included as
Appendix B, for a description of this language, Qlisp.

\textindent{3.} 
When a program comes to a statement allowing parallelism and
 decides (according to the computed value of an allow-parallelism
parameter in the statement) that parallelism is to be invoked, it adds a
collection of tasks to a queue and starts on the first.  When a processor
completes a task it goes to the queue for its next task.  It may execute
some queue management code to decide what to do next.

\textindent{4.} 
Basing parallelism on run-time queues means that a program isn't
written or compiled for any specific number of processors.  The number
available can even change during the course of a computation.  Tasks need
not be of similar length, because a processor finishing a short task merely
takes another from the queue.

\textindent{5.} 
While Qlisp is more fully described in the paper, we mention
just the statement  {\tt QLET} here.  Its form is
%
$$\vbox{\openup\jot\ialign{& #\hfil\cr
	({\tt QLET}\ &{\it allow}\ &(({\it x1} {\it arg1})\cr
	       &    &\quad\dots\cr
	       &    &({\it xn} {\it argn}))\cr
	       &    &.\ {\it body}).\cr
    }}
$$
%
The parameter  {\it allow} is evaluated first.  If its value is
(), i.e. false, parallelism is not allowed and the {\tt QLET} statement
behaves like an ordinary Common Lisp {\tt LET}.  If the value is something
else and not {\tt EAGER}, then a queue of tasks is set up to evaluate
{\it arg1 ... argn,} and the processor takes a task from the queue.  If
the value is {\tt EAGER}, the processor sets up the queue and immediately
starts on evaluating {\it body}, suspending if it comes to a
variable whose value has not yet been computed.

	The point of exhibiting {\tt QLET} here is that the parameter
{\it allow} is evaluated to determine whether parallelism is allowed.
Because Lisp (and symbolic computation generally) is highly recursive,
it can generate a large number of parallel tasks.  However, we need
only enough parallelism to keep our processors busy.  Therefore,
the expression {\it allow} should often be false.  Qlisp is designed
according to the idea that it is necessary to limit parallelism as
well as to instigate it.
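
	To make this concrete, here is a small illustrative sketch (the
function {\tt TREE-SUM} is hypothetical and not part of any proposed
deliverable) that sums the leaves of a binary tree, creating parallel
tasks only near the root of the recursion, where the subtasks are large
enough to be worth queueing:
%
$$\vbox{\openup\jot\ialign{\tt #\hfil\cr
;; Sum the leaves of a binary tree.  DEPTH bounds how far down\cr
;; the recursion new parallel tasks may be created.\cr
(DEFUN TREE-SUM (TREE DEPTH)\cr
\ \ (IF (ATOM TREE)\cr
\ \ \ \ \ \ TREE\cr
\ \ \ \ \ \ (QLET (> DEPTH 0)\cr
\ \ \ \ \ \ \ \ \ \ \ \ ((LEFT (TREE-SUM (CAR TREE) (- DEPTH 1)))\cr
\ \ \ \ \ \ \ \ \ \ \ \ \ (RIGHT (TREE-SUM (CDR TREE) (- DEPTH 1))))\cr
\ \ \ \ \ \ \ \ (+ LEFT RIGHT))))\cr
}}$$
%
Once {\tt DEPTH} reaches zero the {\it allow} expression evaluates to
false and the same code runs as an ordinary {\tt LET}, so a single
definition serves both the sequential and the parallel case.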

	We believe that it is necessary to follow through on the
development and implementation of Qlisp by trying it out on
substantial applications.  These trials will most likely lead to
improvements in Qlisp and in the parallel processing hardware.
They may also determine whether it is possible to relax the requirement
that all memory be equally accessible to all processors, since
the hardware people find this expensive to implement.

	There are two kinds of problems to which parallelism
can be applied.  In one case the goal is to make what you already have run
faster.  This improves interaction with the computer and allows larger
problems to be addressed.  In the second case, speed is of the essence.
Part of the algorithm is devoted to ensuring speed and some sort of
``real time'' behavior.  For example, a process may want to time out or
use some default value if a subprocess does not reply within a given time
limit.  Qlisp primitives directly address problems of the first sort.
Additional primitives may be needed for the real-time applications.  For
this a number of questions will need to be answered.  Can the additional
primitives be coded in Qlisp?  Can they be coded easily and
efficiently?

	We have chosen a Macsyma-like system as an initial target application.  This is an application of the first sort.  
Our reasons for choosing this rather than an AI application are the following.

\textindent{1.} 
Macsyma and similar systems like REDUCE and SMP use a wide variety
of algorithms, each with its own inner loop and each presenting its
own problems of parallelism.

\textindent{2.} 
When a sufficient number of features have been implemented,
we can compare performance with algebraic computation systems running
on single processors.

\textindent{3.} 
The task is definite enough to be readily defined and supervised.
Some of our group are eager to undertake it.

We are also thinking about possible AI applications, e.g. to problem solving,
but are not ready to decide on one.  

\section{Tools for parallel symbolic programming}

The work proposed here involves an entirely new style of programming and
will require the development of new methods and tools to support the
programming activity.  This development will be carried out in parallel
with and in support of the chosen application.

\subsection{Algorithm design}
Part of the development of parallel Lisp involves discovering what
primitives are needed to carry out various tasks.  This involves
classifying problems according to the sort of parallelism required and
designing control structures appropriate for the various classifications.
Two well-known classifications are AND versus OR parallelism, and
asynchronous tasks with occasional interactions versus synchronous tasks
operating on a shared data structure.  Somewhat newer ideas include
pipelines and other
geometric structures that make it easy to visualize the interactions among
a set of processes.  These latter ideas are discussed in
Appendix B.  In some cases successful use of parallelism will
require design of data structures with information making them suited to
parallel processing.
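
	As a small illustration of the first classification,
AND-parallelism among independent subgoals can already be expressed with
the {\tt QLET} form shown earlier.  In the sketch below the predicates
{\tt P1}, {\tt P2}, and {\tt P3} are hypothetical; the three tests are
evaluated as separate tasks and their results combined afterwards:
%
$$\vbox{\openup\jot\ialign{\tt #\hfil\cr
;; Hypothetical AND-parallelism: run three independent tests\cr
;; as parallel tasks, then combine their results.\cr
(QLET T ((A (P1 X))\cr
\ \ \ \ \ \ \ \ \ (B (P2 X))\cr
\ \ \ \ \ \ \ \ \ (C (P3 X)))\cr
\ \ (AND A B C))\cr
}}$$
%
OR-parallelism, where the first task to succeed should be able to
terminate its siblings, calls for constructs beyond {\tt QLET} and is one
of the control-structure design questions to be addressed.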

\subsection{Methods for debugging}
Traditional Lisp debugging methods such as stepping and tracing are not
going to be adequate for debugging Qlisp programs.  Programmers will
need to learn where to look for bugs and to devise methods for identifying
meaningful checkpoints.  We will build tools for monitoring the
progress of a set of processes working on a problem.  We will also work on
informal methods of specification and reasoning about Qlisp programs as
aids to the process of program development and testing.

\subsection{Benchmarks and measures of success}
We will design benchmarks for testing parallel Lisps and for comparing
the relative efficiency of parallel and sequential algorithms.  We will
also develop tools for measuring factors such as degree of parallelism
achieved, queue management overhead, and memory contention.  Some
measuring tools have already been developed by Gabriel in a sophisticated
system for simulating execution of Qlisp programs.  This system was used
in initial experiments testing Qlisp primitives.

\section{Facilities and Support}

In order to develop and test the scheme outlined above, it will be
necessary to have access to machines that execute multiple instruction
streams concurrently.  There are currently a number of novel machines
with parallel architectures in various stages of development, and it is
not clear which approaches will emerge as the most cost-effective in the
long run.

In view of this, we propose to acquire a system that is commercially
available and relatively mature, both in hardware and in operating system,
so that our parallel software development efforts can proceed with
relatively little impairment from developmental bugs in the basic hardware
and operating system.  It will not be necessary to acquire a functionally
complete computer system, including all necessary peripherals---existing
computer facilities at Stanford, including SAIL and SCORE computers, can
provide some of these functions.

We propose that a substantial portion of the developmental effort be
undertaken by Lucid under a subcontract, as discussed below.  [We
recommend that a sole source contract be negotiated with Lucid, Inc.
because Richard Gabriel, who is a founder and President of Lucid, is the
co-inventor of Qlisp.]

For the implementation task, we plan to acquire one multiprocessor system
and make it accessible to all participants via communication links.
Toward this end, we have budgeted for a microwave link between Lucid's
offices and the Stanford campus ethernet, which will also provide linkage
to ARPAnet.  For the follow-on application and testing task it may be
necessary to acquire an additional system.

We have carefully examined a number of possible system configurations from
a number of prospective suppliers, including BBN, Denelcor, Encore,
Sequent, Symbolics, and Synapse.  It appears that several machines that
could meet our needs are available.  We propose to choose the most
suitable system at the time funding becomes available.  The
budget has been formulated based on the Sequent Balance 8000, which is one
of the machines that appears suitable, but we are leaving the final
selection open for now.

\section{Collaboration and connections with other projects}

There are several other groups working on aspects of parallel symbolic
computation with whom collaboration will have mutual benefit.  Two
particular projects are:

\noindent
(HPP-AA) The Advanced Architectures group of the Heuristic Programming
    Project at Stanford (Feigenbaum, Nii)

\noindent
(SPUR) The parallel Lisp on a RISC machine project at UC Berkeley

Some obvious benefits come from the opportunity to share ideas
on real parallel programming techniques and the problems to be solved.
In addition there are some specific benefits.

The Advanced Architectures project will benefit from having a parallel
Lisp running on a real machine rather than using simulations.  They are
working on a parallel implementation of the blackboard model for expert
systems, which will provide an additional substantial application with
emphasis on AI methods and asynchronous interactions.  This will provide
input on additional primitives needed in Qlisp and programs for testing
the implementation of these primitives.  There is already some ongoing
collaboration with regard to designing parallel control structures and
testing them in a simulation of Qlisp.

The SPUR project will get additional applications for testing its
primitives, including benchmarks designed for Qlisp and for the Macsyma
application.  There can be mutual input on parallel primitives at the
software, operating system, and hardware levels.  At a later stage,
assuming good progress on the part of both projects, it will be reasonable
to implement Qlisp on the SPUR machine.  Collaboration with the
SPUR group will be facilitated by our collaboration with Fateman on the
Macsyma application.

\nextpage
\section{Parallel Algorithms for Algebraic Manipulation}

{\it This part of the project is proposed by Richard Fateman
and his group at U.C. Berkeley, who will request separate funding.
This proposed work
is not included in the attached budget but is described here for completeness,
given that it is a task within this project.}

We propose to design, implement, and test algorithms in Qlisp
for the purpose of
(i) validating the Qlisp primitives and compiler design, and
(ii) demonstrating realistic applications where the high speed provided
by multi-processing can be put to good use.

Our areas of interest include algebraic manipulation programs (typified by
Macsyma) and other symbolic mathematical computations.  These programs
have benefited in the past from advances in computer hardware (large
address spaces, cheap memory, high speed), and should lead in the use of
non-numeric parallel processing.  While past uses have been primarily in
applied mathematics and engineering, and occasionally in theorem proving,
number theory, and compiler theory, it appears that there is an increasing
interest in symbolic mathematics capabilities for computer-aided design
and AI applications.

Since important algorithms have already been refined over the past dozen
years or more in serial implementations, we have an opportunity to make
realistic judgments regarding the advantages of parallel programs.

While it is not possible to identify all opportunities for parallelism
at the outset, we can suggest the following directions:

\textindent{1.}
Exploit obvious cases of traditional parallel algorithms, e.g. FFT for
multiplying polynomials. The FFT in symbolic manipulation is done in a
finite computation structure (not complex floating-point numbers), but is
otherwise similar to the usual FFT.

\textindent{2.}
Use multiple processors to multiply two polynomials by divide and
conquer, or by farming out partial products. Combining partial products by
addition is tricky, and hash-coding by exponents (to coalesce
corresponding coefficients) may provide a neat application.

\textindent{3.}
Examine multiple-homomorphism (mod Q) algorithms such as polynomial
GCD, trying to do several moduli at the same time, for example.  The GCD
problem has been well studied, but GCD computation can nevertheless occupy
large amounts of time in certain common manipulations (e.g. addition of
rational functions).

\textindent{4.}
In factoring polynomials over the integers, several moduli are generally
used in a first step (factoring polynomials over several finite fields).
This could clearly be done in parallel, although this is not usually the
expensive part of the algorithm.  Combinatorial matching of factors of
different degrees is much more time-consuming; an extravagant number of
moduli can sometimes simplify the matching.  Alternatively, the matching
process could be done in parallel.

\textindent{5.}
Linear algebra algorithms: some classical calculations have been shown
to run relatively faster with symbolic entries, using methods that are
highly appropriate for parallel calculation, even though the same methods
are terrible for floating-point.  The best-known example is the
computation of matrix determinants, for which expansion by minors is
believed superior to Gaussian elimination.  Actually implementing this on
a multiprocessor should make good use of division of labor.

\textindent{6.}
General simplification of expressions represented in trees: to simplify
$f(a,b,c,\ldots)$ one simplifies $a,b,c,\ldots$ to $a', b', c', \ldots$
and then simplifies $f(a', b', \ldots)$.
Two improvements are obvious: simplify the arguments in parallel;
sort the arguments by a parallel sort (e.g. if $f$ is $+$ or some
other commutative operation).  A Qlisp sketch of the first improvement
appears after this list.

\textindent{7.}
Pattern matching drives some algorithms (e.g. indefinite integration).
Heuristic pattern matching with back-tracking may become more practical
with parallel processing, and should perhaps be revived as a more
fundamental control structure in some of the problem-solving programs.

\textindent{8.}
Parallel garbage collection of cons cells or other (much larger) data
objects constructed by algebraic manipulation might be worth exploring.

\textindent{9.}
We have experimented with frame-based objects with multiple
representations: we could compute simultaneously with different forms
(e.g.
polynomials: [expanded, recursive, factored],
curves: [parametric, point-set, constraints],
matrices: [dense, sparse, factored]).
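
As a rough indication of how the first improvement in direction 6 might
look in Qlisp, the following sketch simplifies the argument list of an
expression in parallel by splitting it recursively with {\tt QLET}.  All
function names here are hypothetical; {\tt SIMPLIFY-NODE} stands for a
serial simplifier applied to a single node:
%
$$\vbox{\openup\jot\ialign{\tt #\hfil\cr
;; Hypothetical sketch: simplify the arguments of an expression\cr
;; in parallel, then simplify the rebuilt expression serially.\cr
(DEFUN PARALLEL-SIMPLIFY (EXPR)\cr
\ \ (IF (ATOM EXPR)\cr
\ \ \ \ \ \ EXPR\cr
\ \ \ \ \ \ (SIMPLIFY-NODE\cr
\ \ \ \ \ \ \ \ (CONS (CAR EXPR) (SIMPLIFY-ARGS (CDR EXPR))))))\cr
\noalign{\smallskip}
;; Simplify each element of ARGS, forking a task per element.\cr
(DEFUN SIMPLIFY-ARGS (ARGS)\cr
\ \ (IF (NULL ARGS)\cr
\ \ \ \ \ \ NIL\cr
\ \ \ \ \ \ (QLET T ((HEAD (PARALLEL-SIMPLIFY (CAR ARGS)))\cr
\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (TAIL (SIMPLIFY-ARGS (CDR ARGS))))\cr
\ \ \ \ \ \ \ \ (CONS HEAD TAIL))))\cr
}}$$
%
In a real implementation the {\it allow} argument would of course be a
computed cutoff rather than the constant {\tt T}, as discussed earlier,
so that the number of tasks does not greatly exceed the number of
processors.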

\lheader{Resources needed:}

We assume there will be an implementation effort at Stanford which
will produce a Qlisp simulation and an actual Qlisp implementation
on a parallel architecture machine.

The simulation would be available for use on hardware already in place
at UCB (including VAX, SUN, Symbolics, or Xerox), but the actual
parallel processor would be at Stanford.  Substantial computing would
be required at UCB for program preparation, testing, communication,
etc.

\lheader{Reference:}

{\bf Gabriel, Richard P. and John McCarthy (1984)}:
``Queue-based Multi-processing Lisp''
in {\it Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming},
August 1984.

\nextpage
%  Appendix B -- separately TEXed QLISP.TEX
\bye